April 14, 2020

Plan for this Week

Part 1: wrap up base R

  1. Finish subsetting (matrices, factors and data.frames)

  2. Subsetting and assignment

Part 2: modern R basics

  1. Visualization in base R

  2. Visualization in ggplot2 (intro)

Peer Review: Grading Rubric

  • Click here and look for the folder with your name. There you will find the two files (Rmd and html) you are to peer review.

  • You must be signed in with your edu.pdx account to access the document.

  • Type your peer evaluation in Word, referring to the line in the Rmd file or to the specific homework problem number you are commenting on.

Peer Review: Grading Rubric

Again, be constructive and considerate. Use a continuous scale between 0 and 3, using the following as reference:

0 - No homework turned in.

1 - Turned in, but low effort: poorly presented, with nonfunctional code and directions ignored.

2 - Decent effort: well presented, all code works, and directions were followed, with some minor issues.

3 - Nailed it!

Data Visualization in R

Data Visualization in R before ggplot2

  • Since R was developed explicitly for statistics, it comes with extensive plotting capabilities by default

  • MANY plotting functions are installed in the graphics package, which ships with base R

  • Look into the help files for the functions in this package using library(help = "graphics")

  • Here are a few examples of visualization functions in the graphics package:

plot, lines, points, abline, boxplot, pairs, matplot, barplot, curve, dotchart, pie, rasterImage, coplot, cdplot, mosaicplot, polygon
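One way to browse what the graphics package offers directly from the console is sketched below. The variable name graphics.fns is just an illustrative choice, and the exact number of functions may vary with the R version.

```r
# List every object exported by the graphics package
# (graphics is attached by default in a standard R session)
graphics.fns <- ls("package:graphics")

length(graphics.fns)    # how many functions are available
head(graphics.fns, 10)  # the first few, in alphabetical order

# Check that a few of the functions named above really live there
c("boxplot", "curve", "mosaicplot") %in% graphics.fns
```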

Data visualization before ggplot2

plot

  • plot is a generic function for plotting R objects

  • Functions in R can be designed so that the same function behaves completely differently depending on the class of the object it is used with

  • Here is a list of the plot methods available (the list depends on which packages are loaded)

methods(plot)
##  [1] plot,ANY-method     plot,color-method   plot.acf*          
##  [4] plot.data.frame*    plot.decomposed.ts* plot.default       
##  [7] plot.dendrogram*    plot.density*       plot.ecdf          
## [10] plot.factor*        plot.formula*       plot.function      
## [13] plot.ggplot*        plot.gtable*        plot.hcl_palettes* 
## [16] plot.hclust*        plot.histogram*     plot.HoltWinters*  
## [19] plot.isoreg*        plot.lm*            plot.medpolish*    
## [22] plot.mlm*           plot.ppr*           plot.prcomp*       
## [25] plot.princomp*      plot.profile.nls*   plot.R6*           
## [28] plot.raster*        plot.spec*          plot.stepfun       
## [31] plot.stl*           plot.table*         plot.trans*        
## [34] plot.ts             plot.tskernel*      plot.TukeyHSD*     
## see '?methods' for accessing help and source code
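To make the idea of a generic function concrete, here is a minimal sketch of how dispatch works. The generic describe and its two methods are hypothetical names invented for this illustration; the plot calls at the end use methods that appear in the list above.

```r
# A toy generic: UseMethod() picks a method based on the class of x
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) "some R object"

# Method used when x is a data frame
describe.data.frame <- function(x, ...) {
  paste("data frame with", nrow(x), "rows and", ncol(x), "columns")
}

describe(1:10)  # no method for integers, so describe.default runs
describe(cars)  # cars is a data frame, so describe.data.frame runs

# plot() dispatches in the same way
plot(cars)                      # plot.data.frame: scatterplot of the two columns
plot(density(cars$speed))       # plot.density: a smooth density curve
plot(factor(c("a", "b", "a")))  # plot.factor: bar chart of the level counts
```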

Data visualization before ggplot2

plot

The default form of the function generates scatterplots

Its general structure is plot(x, y, ...), where

  • x is the variable on the x axis,

  • y is the variable on the y axis, and

  • ... represents other graphical parameters (see ?par for an extensive list)

Let’s do an example to see some of the options

Data visualization before ggplot2

Example: the cars dataset

Let’s load the built-in dataset cars, which is a data frame (a type of object mentioned earlier). Then we can look at it in a couple of different ways.

data(cars) loads this data frame into the Global Environment as a promise. A promise is an expression whose evaluation is delayed until its value is first needed.
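If promises feel abstract, the following sketch creates one by hand with delayedAssign. The name lazy.value and the message text are made up for illustration; this only mimics the behavior described above and is not how data() is implemented internally.

```r
# Create a promise: the expression in braces is stored but not run yet
delayedAssign("lazy.value", {
  message("evaluating now")
  42
})

# Nothing has been evaluated so far. The first use forces the promise,
# so the message appears here:
lazy.value + 1

# Later uses reuse the stored value without re-running the expression
lazy.value
```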

str(cars, 5) 
## 'data.frame':    50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...

Data visualization before ggplot2

Example: the cars dataset

head(cars,4) # prints first 4 rows
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
summary(cars) # summary stats for each var
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Data visualization before ggplot2

plot options

plot(x = cars$speed, y = cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Speeds and stopping distances of cars",
     type = "p",
     lty = 1, lwd = 1,
     pch = 16,
     cex = 1, cex.axis = 1, cex.lab = 1,
     col = "firebrick") 

Data visualization before ggplot2

plot options: type
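
The figure for this slide is not reproduced in the text, so the following is only a sketch of what the type argument does, assuming the slide compared several of its values on the cars data. The names types, tp and old.par are illustrative, and par(mfrow = ...) is used here just to place the panels side by side.

```r
# Values of `type`: points, lines, both, overplotted, histogram-like, steps
types <- c("p", "l", "b", "o", "h", "s")

old.par <- par(mfrow = c(2, 3))  # 2 x 3 grid of panels; keep the old setting
for (tp in types) {
  plot(x = cars$speed, y = cars$dist, type = tp,
       xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
       main = paste0("type = \"", tp, "\""),
       col = "firebrick")
}
par(old.par)  # restore the single-panel layout
```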

Data visualization before ggplot2

plot options: cex

obj.size <- seq(0.5,5,length.out = length(cars$speed))

plot(x = cars$speed, y = cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Speeds and stopping distances of cars",
     type = "p",
     lty = 1, lwd = 1,
     pch = 16,
     cex = obj.size, cex.axis = 1, cex.lab = 1,
     col = "firebrick") 

Data visualization before ggplot2

plot options: col

col.vec <- rep(c("firebrick","forestgreen","cornflowerblue"),
               times=c(sum(cars$speed<10), 
                       sum(cars$speed>=10&cars$speed<17),
                       sum(cars$speed>=17)))

plot(x = cars$speed, y = cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Speeds and stopping distances of cars",
     type = "p",
     lty = 1, lwd = 1,
     pch = 16,
     cex = 1, cex.axis = 1, cex.lab = 1,
     col = col.vec) 
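
Note that building col.vec with rep only works because the rows of cars happen to be sorted by speed. A sketch of an alternative that assigns the color row by row, and so does not depend on the row order, is shown here. The names speed.group, group.cols and col.by.row are illustrative choices.

```r
# Assign each row to a speed group: [-Inf, 10), [10, 17), [17, Inf)
speed.group <- cut(cars$speed,
                   breaks = c(-Inf, 10, 17, Inf),
                   right = FALSE,
                   labels = c("speed<10", "10<=speed<17", "speed>=17"))

# One color per group, looked up by the group's integer code
group.cols <- c("firebrick", "forestgreen", "cornflowerblue")
col.by.row <- group.cols[as.integer(speed.group)]

table(speed.group)  # how many cars fall in each group

plot(x = cars$speed, y = cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Speeds and stopping distances of cars",
     pch = 16,
     col = col.by.row)
```

A side benefit is that levels(speed.group) can be reused as the legend labels, so the colors and labels cannot drift apart.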

Data visualization before ggplot2

Adding a legend

plot(x = cars$speed, y = cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Speeds and stopping distances of cars",
     type = "p",
     lty = 1, lwd = 1,
     pch = 16,
     cex = 1, cex.axis = 1, cex.lab = 1,
     col = col.vec) 

legend("topleft",
       bty = "n",
       pch=c(16,16,16),
       col=c("firebrick","forestgreen","cornflowerblue"),
       legend=c("speed<10","10<=speed<17","speed>=17")) 

In-class exercise 1

Your task is to play with the swiss data set built into R for 20 mins

  • Use ?swiss to see what things mean in the dataset

  • Go to the in-class exercise Rmd document you started working on Tuesday

  • Load the data using data(swiss)

  • Think of and write down in your Rmd document one or two questions you’d like to explore with these data

  • Use the function plot to explore your questions and make 2 or 3 nicely formatted plots with the options we discussed so far (include legends, play with col, cex, type)

Data visualization before ggplot2

Other plots: boxplot

par(mfrow = c(1, 2), mai = c(1, 0.5, 0.1, 0.1))
boxplot(decrease ~ treatment, data = OrchardSprays, col = "cornflowerblue",
        log = "y", cex.axis = 0.7, cex.lab = 0.7, notch = FALSE)
## horizontal = TRUE switches the roles of y and x:
boxplot(decrease ~ treatment, data = OrchardSprays, col = "cornflowerblue",
        log = "x", horizontal = TRUE, cex.axis = 0.7, cex.lab = 0.7, notch = FALSE)

Data visualization before ggplot2

Other plots: curve

par(mfrow=c(1,3),mai=c(0.9,0.4,0.1,0.1))
curve(expr=sin, from=-2*pi, to=2*pi, xname = "t",cex.axis=0.7, cex.lab=0.7)
curve(expr=tan, xname = "t", from=-2*pi, to=2*pi, cex.axis=0.7, cex.lab=0.7)
myfn <- function(t){tan(t)*sin(t)}
curve(expr=myfn, xname = "t", from=-2*pi, to=2*pi, cex.axis=0.7, cex.lab=0.7)

Data visualization before ggplot2

Other plots: hist

par(mfrow=c(1,2),mai=c(1,0.5,0.1,0.1))
x <- rchisq(1000, df = 4)
hist(x, freq = FALSE, ylim = c(0, 0.2),col="orange",main="")
# histogram with a kernel density estimate overlaid
hist(x, freq = FALSE, ylim = c(0, 0.2),col="orange",main="")
# lines() always draws on the current plot, so it takes no add argument
lines(density(x,from=0, to=20), col = "blue3", lty = 1, lwd = 3)
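
Unlike lines, the curve function does accept add = TRUE, which makes it convenient for overlaying a theoretical density on the histogram. The sketch below draws both the kernel density estimate and the chi-squared density with 4 degrees of freedom. The seed value and the legend labels are arbitrary choices made for this illustration.

```r
set.seed(1)  # arbitrary seed so the simulated sample is reproducible
x <- rchisq(1000, df = 4)

hist(x, freq = FALSE, ylim = c(0, 0.2), col = "orange", main = "")

# Estimated density: lines() adds to the existing plot by design
lines(density(x, from = 0, to = 20), col = "blue3", lty = 1, lwd = 3)

# True density: curve() needs add = TRUE to draw on top of the histogram
curve(dchisq(x, df = 4), from = 0, to = 20,
      col = "firebrick", lty = 2, lwd = 3, add = TRUE)

legend("topright", bty = "n",
       lty = c(1, 2), lwd = 3,
       col = c("blue3", "firebrick"),
       legend = c("kernel estimate", "chi-squared, df = 4"))
```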

Data visualization before ggplot2

Other plots: pairs

par(mai=c(0.1,0.1,0.1,0.1))
pairs(iris[1:3], cex=0.5, cex.labels = 1,cex.axis=0.7,
      pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])

Data visualization before ggplot2

Other plots: dotchart

VADeaths
##       Rural Male Rural Female Urban Male Urban Female
## 50-54       11.7          8.7       15.4          8.4
## 55-59       18.1         11.7       24.3         13.6
## 60-64       26.9         20.3       37.0         19.3
## 65-69       41.0         30.9       54.6         35.1
## 70-74       66.0         54.3       71.1         50.0
par(mai=c(0.4,0.1,0.4,0.1))
dotchart(VADeaths, bg = "skyblue",
         cex=0.7, cex.axis=0.1,
         main = "Death Rate VA - 1940")

Data visualization before ggplot2

Other plots: matplot

par(mfrow=c(1,2),mai=c(0.4,0.4,0.1,0.1))
sines <- outer(1:20, 1:4, function(x, y) sin(x / 20 * pi * y))
matplot(sines, pch = 1:4, type = "o", col = rainbow(ncol(sines)),
        cex=0.5, cex.axis=0.7)
matplot(sines, type = "b", pch = 21:23, col = 2:5, bg = 2:5, 
        cex=0.5, cex.axis=0.7, main = "")

Data visualization before ggplot2

Other plots: barplot

par(mfrow=c(1,2),mai=c(0.9,0.4,0.1,0.1))
barplot(GNP ~ Year, data = longley, cex=0.5, cex.axis=0.7, cex.lab=0.7)
barplot(cbind(Employed, Unemployed) ~ Year, data = longley, cex = 0.5,
        cex.axis=0.7,cex.lab=0.7)

Data visualization before ggplot2

Other plots: mosaicplot

par(mfrow=c(1,1),mai=c(0.9,0.4,1,0.4))
mosaicplot(~ Sex + Age + Survived, data = Titanic, main="",color = TRUE)

Data visualization before ggplot2

Figures with multiple panels: mfrow (or mfcol)

par(mfrow=c(1,3))
plot(x = cars$speed, y = cars$dist,xlab = "", ylab = "", main = "",
     type = "p", pch = 16, col = "firebrick") 
plot(x = cars$speed, y = cars$dist,xlab = "", ylab = "", main = "",
     type = "p", pch = 16, col = "firebrick") 
abline(reg=lm(dist~speed,data=cars),col="forestgreen")
plot(x = cars$speed, y = cars$dist,xlab = "", ylab = "", main = "",
     type = "p", pch = 16, col = "firebrick") 
lines(lowess(cars),col="cornflowerblue")
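
The difference between mfrow and mfcol is only the order in which panels are filled. A small sketch, with illustrative panel titles, that shows the fill order and restores the original setting afterwards:

```r
# mfrow: the 2 x 2 grid is filled row by row
old.par <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(cars, pch = 16, col = "firebrick", main = paste("mfrow panel", i))
}

# mfcol: the same grid is filled column by column
par(mfcol = c(2, 2))
for (i in 1:4) {
  plot(cars, pch = 16, col = "forestgreen", main = paste("mfcol panel", i))
}

par(old.par)  # back to one plot per figure
```

Saving the value returned by par and passing it back at the end keeps the multi-panel setting from leaking into later plots.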

Data visualization before ggplot2

Figures with multiple panels: the layout function

layout(mat, 
       widths = rep.int(1, ncol(mat)),
       heights = rep.int(1, nrow(mat)), 
       respect = FALSE)

Data visualization before ggplot2

Figures with multiple panels: the layout function

nf <- layout(matrix(c(2,0,1,3),2,2,byrow = TRUE), widths=c(3,1), heights=c(1,3), respect = TRUE)
layout.show(nf)
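
As a sketch of how this layout might be filled, the code below puts a scatterplot of cars in panel 1 and marginal histograms in panels 2 and 3, loosely following the example in ?layout. The names speed.hist, dist.hist, top and old.par are illustrative, and the marginal bars are only roughly aligned with the scatterplot axes.

```r
nf <- layout(matrix(c(2, 0, 1, 3), 2, 2, byrow = TRUE),
             widths = c(3, 1), heights = c(1, 3), respect = TRUE)

# Compute the histograms without drawing them
speed.hist <- hist(cars$speed, plot = FALSE)
dist.hist  <- hist(cars$dist, plot = FALSE)
top <- max(speed.hist$counts, dist.hist$counts)

# Panel 1 (bottom left): the scatterplot
old.par <- par(mar = c(4, 4, 1, 1))
plot(cars$speed, cars$dist, pch = 16, col = "firebrick",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Panel 2 (top left): distribution of speed
par(mar = c(0, 4, 1, 1))
barplot(speed.hist$counts, axes = FALSE, ylim = c(0, top), space = 0)

# Panel 3 (bottom right): distribution of stopping distance, drawn sideways
par(mar = c(4, 0, 1, 1))
barplot(dist.hist$counts, axes = FALSE, xlim = c(0, top), space = 0,
        horiz = TRUE)

layout(1)     # return to a single panel
par(old.par)  # restore the original margins
```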

In-class exercise 2

  • Extend your exploration of swiss by using 2 or 3 of the figure types discussed after plot

  • Make a figure with two panels (1 row by 2 columns) using mfrow

  • Make a figure with 3 different plots using layout